Normalizing German and English Inflectional Morphology to Improve Statistical Word Alignment
Abstract
German has a richer system of inflectional morphology than English, which causes problems for current approaches to statistical word alignment. Using Giza++ as a reference implementation of IBM Model 1, an HMM-based alignment model, and IBM Model 4, we measure the impact of normalizing inflectional morphology on German-English statistical word alignment. We demonstrate that normalizing inflectional morphology improves model perplexity and reduces alignment errors.
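As a rough illustration of the setup the abstract describes, the sketch below trains a minimal IBM Model 1 with EM on a toy German-English corpus, once on surface forms and once after normalizing German inflection. It is not the paper's method or Giza++: the corpus, the `LEMMAS` table, and the function names `normalize`, `train_ibm1`, and `perplexity` are all hypothetical, only the German side is normalized, and the sketch makes no claim about which variant scores better.

```python
import math
from collections import defaultdict

# Hypothetical toy lemma table; a real system would use a morphological analyzer.
LEMMAS = {"das": "der", "den": "der", "kleine": "klein",
          "kleines": "klein", "kleinen": "klein"}

def normalize(tokens):
    """Map each inflected form to its lemma when the toy table knows it."""
    return [LEMMAS.get(tok, tok) for tok in tokens]

def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1. Returns t[(tgt_word, src_word)] = p(tgt_word | src_word)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            src_n = ["NULL"] + src  # NULL lets target words stay unaligned
            for w in tgt:
                z = sum(t[(w, s)] for s in src_n)
                for s in src_n:
                    c = t[(w, s)] / z  # expected alignment count (E-step)
                    count[(w, s)] += c
                    total[s] += c
        # M-step: renormalize expected counts per source word
        t = defaultdict(float, {k: v / total[k[1]] for k, v in count.items()})
    return t

def perplexity(t, pairs):
    """Per-target-word perplexity under Model 1 (length term omitted)."""
    log_prob, n_words = 0.0, 0
    for src, tgt in pairs:
        src_n = ["NULL"] + src
        for w in tgt:
            p = sum(t.get((w, s), 0.0) for s in src_n) / len(src_n)
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)

raw_pairs = [
    ("das kleine haus".split(), "the small house".split()),
    ("ein kleines haus".split(), "a small house".split()),
    ("den kleinen hund".split(), "the small dog".split()),
    ("der kleine hund".split(), "the small dog".split()),
]
norm_pairs = [(normalize(src), tgt) for src, tgt in raw_pairs]

for name, pairs in [("raw", raw_pairs), ("normalized", norm_pairs)]:
    model = train_ibm1(pairs)
    print(name, "training perplexity:", round(perplexity(model, pairs), 3))
```

Perplexity measured on the training pairs of such a tiny corpus says little by itself; the abstract's claims rest on Giza++ models trained on real parallel data and on alignment error measurements.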
Similar resources
Experiments with word alignment, normalization and clause reordering for SMT between English and German
This paper presents the LIU system for the WMT 2011 shared task for translation between German and English. For English–German, we attempted to improve the translation tables with a combination of standard statistical word alignments and phrase-based word alignments. For German–English translation, we tried to make the German text more similar to the English text by normalizing German morphology...
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
We have investigated the potential for improvement in target language morphology when translating into Swedish from English and German, by measuring the errors made by a state-of-the-art phrase-based statistical machine translation system. Our results show that there is indeed a performance gap to be filled by better modelling of inflectional morphology and compounding; and that the gap is not ...
Modelling Linguistic Phenomena with Unsupervised Morphology for Improving Statistical Machine Translation
This work studies an ascetic approach to statistical machine translation. We assume that only a small parallel corpus is available, and no other mono- or bilingual corpora or linguistic tools can be used, which is the case for many resource-scarce languages. Our aim is to find out how a baseline SMT system can be improved under this condition. In such a case one of the natural choices is to use u...
Deeper than Words: Morph-based Alignment for Statistical Machine Translation
In this paper we introduce a novel approach to alignment for statistical machine translation. The core idea is to align subword units, or morphs, instead of word forms. This results in a joint segmentation and alignment model, aimed to improve translation quality for morphologically rich languages and reduce the size of the required parallel corpora. Here we focus on translating from inflection...
English-Latvian SMT: knowledge or data?
When phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, the translated texts are often not fluent because of misused inflectional forms and wrong word order between phrases or even inside a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...